Lending Club Case Study¶

by Ankit Kumar Surana¶

Introduction¶

As an employee of a consumer finance company specializing in lending various types of loans to urban clients, part of my responsibility involves facilitating loan approval decision-making. This entails evaluating application profiles and identifying potential risks associated with loan repayment. To accomplish this task, I need to analyze the data provided in "loan.csv", which contains historical information about past loan applicants along with their default status. The goal is to identify patterns that indicate the likelihood of an applicant defaulting, enabling us to take appropriate actions such as denying a loan, adjusting loan terms, or applying higher interest rates to risky applicants.¶
Through this analysis, my objective is to gain insights into the consumer and loan attributes that influence the likelihood of default, as well as to identify the key driving factors or variables behind loan defaults. By understanding these factors, the company can improve its portfolio management and risk assessment strategies.¶

Problem Statement¶

The aim is to identify patterns which indicate if a person is likely to default, which may be used for taking actions such as denying the loan, reducing the amount of loan, lending (to risky applicants) at a higher interest rate, etc.¶
When a person applies for a loan, there are two types of decisions that could be taken by the company:¶
1. Loan accepted: If the company approves the loan, there are 3 possible scenarios described below:¶
- Fully paid: Applicant has fully paid the loan (the principal and the interest rate)¶
- Current: Applicant is in the process of paying the instalments, i.e. the tenure of the loan is not yet completed. These candidates are not labelled as 'defaulted'.¶
- Charged-off: Applicant has not paid the instalments in due time for a long period of time, i.e. he/she has defaulted on the loan¶
2. Loan rejected: The company had rejected the loan (because the candidate does not meet their requirements etc.). Since the loan was rejected, there is no transactional history of those applicants with the company and so this data is not available with the company (and thus in this dataset)¶

DataSet¶

The dataset is provided in a CSV file named "loan.csv", and the corresponding data dictionary is available in "Data_Dictionary.xlsx".¶

Preliminary Wrangling¶

Importing essential libraries and configuring settings for data analysis and visualization¶

Gathering¶

Number of Rows in loan.csv :  39717 
Number of columns in loan.csv :  111
 DataSet : Loan.csv 

Out[5]:
id member_id loan_amnt funded_amnt funded_amnt_inv term int_rate installment grade sub_grade emp_title emp_length home_ownership annual_inc verification_status issue_d loan_status pymnt_plan url desc purpose title zip_code addr_state dti delinq_2yrs earliest_cr_line inq_last_6mths mths_since_last_delinq mths_since_last_record open_acc pub_rec revol_bal revol_util total_acc initial_list_status out_prncp out_prncp_inv total_pymnt total_pymnt_inv total_rec_prncp total_rec_int total_rec_late_fee recoveries collection_recovery_fee last_pymnt_d last_pymnt_amnt next_pymnt_d last_credit_pull_d collections_12_mths_ex_med mths_since_last_major_derog policy_code application_type annual_inc_joint dti_joint verification_status_joint acc_now_delinq tot_coll_amt tot_cur_bal open_acc_6m open_il_6m open_il_12m open_il_24m mths_since_rcnt_il total_bal_il il_util open_rv_12m open_rv_24m max_bal_bc all_util total_rev_hi_lim inq_fi total_cu_tl inq_last_12m acc_open_past_24mths avg_cur_bal bc_open_to_buy bc_util chargeoff_within_12_mths delinq_amnt mo_sin_old_il_acct mo_sin_old_rev_tl_op mo_sin_rcnt_rev_tl_op mo_sin_rcnt_tl mort_acc mths_since_recent_bc mths_since_recent_bc_dlq mths_since_recent_inq mths_since_recent_revol_delinq num_accts_ever_120_pd num_actv_bc_tl num_actv_rev_tl num_bc_sats num_bc_tl num_il_tl num_op_rev_tl num_rev_accts num_rev_tl_bal_gt_0 num_sats num_tl_120dpd_2m num_tl_30dpd num_tl_90g_dpd_24m num_tl_op_past_12m pct_tl_nvr_dlq percent_bc_gt_75 pub_rec_bankruptcies tax_liens tot_hi_cred_lim total_bal_ex_mort total_bc_limit total_il_high_credit_limit
0 1077501 1296599 5000 5000 4975.0 36 months 10.65% 162.87 B B2 NaN 10+ years RENT 24000.0 Verified Dec-11 Fully Paid n https://lendingclub.com/browse/loanDetail.action?loan_id=1077501 Borrower added on 12/22/11 > I need to upgrade my business technologies.<br> credit_card Computer 860xx AZ 27.65 0 Jan-85 1 NaN NaN 3 0 13648 83.70% 9 f 0.0 0.0 5863.155187 5833.84 5000.0 863.16 0.0 0.0 0.0 Jan-15 171.62 NaN May-16 0.0 NaN 1 INDIVIDUAL NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN NaN NaN NaN
Number of Rows in Data_Dictionary.xlsx :  115 
Number of columns in Data_Dictionary.xlsx :  2
 DataSet : Data_Dictionary.xlsx 

Out[7]:
LoanStatNew Description
0 acc_now_delinq The number of accounts on which the borrower is now delinquent.

Observation¶

The dataset comprises roughly 39,717 loan records and encompasses about 111 columns, which include both consumer and loan data. Furthermore, the data dictionary contains approximately 115 entries and is structured with 2 columns: one for the column names and another for their respective descriptions.¶

Assessing¶

Observation¶

1. The dataset does not contain any duplicate rows.¶
2. Approximately 54 columns have all NULL values.¶
3. About 9 columns have the same values across all records.¶
4. The dataset comprises approximately 7 categorical variables: 'term', 'grade', 'sub_grade', 'verification_status', 'loan_status', 'purpose', and 'home_ownership'.¶

Cleaning¶

1) Remove all the columns that are being used post loan approval.¶
2) Identify and remove rows where the loan status == "Current".¶
3) Remove the columns with all NULL or NaN values.¶
4) Remove the textual or masked columns that is irrelevant to analysis.¶
5) Remove all columns that have identical values across all rows.¶
6) Cleanse the data within columns containing '%' symbols.¶
7) Strip the alphabets from the sub-grade column.¶
8) Standardize the values within the emp_length column.¶
9) Round-off the amount value/ interest rate to the nearest two decimal places.¶
10) Convert the data-type of date columns to appropriate date formats.¶
11) Convert the cleaned '%' data in columns to float data type.¶
12) Convert the data type of columns having categorical value to categorical data types.¶
13) Decompose the date columns into smaller units like month and year.¶
14) Derive categorical variables from the loan_amnt and int_rate.¶
15) Rename the columns for clarity by using full terms instead of abbreviations.¶
16) Address the missing values in the dataset through imputation or deletion.¶
17) Handle the outliers in the data.¶
Number of Rows in df_clean after Cleaning :  37485 
Number of columns in df_clean after Cleaning:  23

Below is the data dictionary for the remaining columns on which we will conduct the analysis.¶

LoanStatNew Description
0 addr_state The state provided by the borrower in the loan application
1 annual_inc The self-reported annual income provided by the borrower during registration.
2 dti A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
3 emp_length Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
4 funded_amnt The total amount committed to that loan at that point in time.
5 funded_amnt_inv The total amount committed by investors for that loan at that point in time.
6 grade LC assigned loan grade
7 home_ownership The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
8 installment The monthly payment owed by the borrower if the loan originates.
9 int_rate Interest Rate on the loan
10 issue_d The month which the loan was funded
11 loan_amnt The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
12 loan_status Current status of the loan
13 pub_rec_bankruptcies Number of public record bankruptcies
14 purpose A category provided by the borrower for the loan request.
15 sub_grade LC assigned loan subgrade
16 term The number of payments on the loan. Values are in months and can be either 36 or 60.
17 verification_status Indicates if income was verified by LC, not verified, or if the income source was verified

Below is the segregation of Customer and Loan attributes post Data Assessment and Cleaning¶

Customer Attributes¶

1. annual_inc → Float Data Type¶
2. debt_to_income → Float Data Type¶
3. pub_rec_bankruptcies → Float Data Type¶
4. home_ownership → Categorical Data Type¶
5. addr_state → String Data Type¶
6. emp_length → Categorical Data Type¶

Loan Attributes¶

1. term → Categorical Data Type¶
2. issue_d → DateTime Data Type¶
3. grade → Categorical Data Type¶
4. sub_grade → Categorical Data Type¶
5. verification_status → Categorical Data Type¶
6. loan_status → Categorical Data Type¶
7. purpose → Categorical Data Type¶
8. loan_amnt → Float Data Type¶
9. funded_amnt → Float Data Type¶
10. funded_amnt_inv → Float Data Type¶
11. int_rate → Float Data Type¶
12. installment → Float Data Type¶

Derived Attributes¶

1. issue_d_year → Integer Data Type¶
2. issue_d_month → Categorical Data Type¶
3. loan_amnt_b → Categorical Data Type¶
4. debt_to_income_b → Categorical Data Type¶
5. int_rate_b → Categorical Data Type¶

Observation:¶

Following the cleaning process, the dataset now contains 37,170 records and 23 columns. This reflects a reduction of approximately 6.41% in the number of records and an 74% reduction in the number of columns with 5 new derived columns. Subsequently, we will employ this refined dataset for our Exploratory Data Analysis (EDA), encompassing univariate, segmented univariate, bivariate, and multivariate analyses.¶

Exploratory Data Analisys¶

Univariate Analysis¶

→ Mean, Median, Max, Min, Std, Variance, Count¶
→ Distribution ( Histogram, CountPlot, BoxPlot)¶

Bivariate Analysis¶

→ Relationship Between 2 Variables ( ScatterPlot, BoxPlot, BarPlot etc)¶

Multivariate Analysis¶

→ Relationship Between more variables ( Heatmap etc.)¶
Numerical_Columns : ['loan_amnt', 'funded_amnt', 'funded_amnt_inv', 'int_rate', 'installment', 'annual_inc', 'debt_to_income', 'pub_rec_bankruptcies', 'issue_d_year']

Cateogrical_Columns : ['term', 'grade', 'sub_grade', 'emp_length', 'home_ownership', 'verification_status', 'loan_status', 'issue_d_month', 'loan_amnt_b', 'int_rate_b', 'debt_to_income_b']

Extra_Columns : ['issue_d', 'purpose', 'addr_state']

Univariate Exploration¶

Initiating detailed analysis of loan_amnt...

Statistical summary for loan_amnt:
count    37485.000000
mean     10678.593437
std       6884.877579
min        500.000000
25%       5000.000000
50%       9500.000000
75%      15000.000000
max      35000.000000
Name: loan_amnt, dtype: float64

The mode of loan_amnt is: 10000.0

No description has been provided for this image
No description has been provided for this image

Observation:¶

The distribution depicted above is right-skewed, indicating that the majority of loan application amounts fell between 5000-10000, followed by 0-5000 and then 10000-15000. However, the mean loan amount is 10678 and the mode is 10000.¶
Initiating detailed analysis of annual_inc...

Statistical summary for annual_inc:
count     37485.00000
mean      63517.37955
std       32070.74777
min        4000.00000
25%       40000.00000
50%       57000.00000
75%       80000.00000
max      189000.00000
Name: annual_inc, dtype: float64

The mode of annual_inc is: 60000.0

No description has been provided for this image
No description has been provided for this image

Observation:¶

The above distribution is right skewed and most of the loan application where from customers whose annual income lies between 40000-60000. The mean of annual income of the customers is 63517 and the mode is 60000.¶
Initiating detailed analysis of int_rate...

Statistical summary for int_rate:
count    37485.000000
mean        11.878059
std          3.661481
min          5.420000
25%          8.940000
50%         11.710000
75%         14.270000
max         24.400000
Name: int_rate, dtype: float64

The mode of int_rate is: 10.99

No description has been provided for this image
No description has been provided for this image

Observation:¶

The above distribution is non-symmetric bimodal and most of the loans had an interest rate between 11%-12%, followed by 10%-11% and then 7%-8%. Also there is a sudden drop in loans where the interest rate is 7%-9%.¶
Initiating detailed analysis of issue_d_year...

Statistical summary for issue_d_year:
count    37485.000000
mean      2010.303188
std          0.883444
min       2007.000000
25%       2010.000000
50%       2011.000000
75%       2011.000000
max       2011.000000
Name: issue_d_year, dtype: float64

The mode of issue_d_year is: 2011

Count summary for issue_d_year:
issue_d_year
2011    19775
2010    11318
2009     4614
2008     1538
2007      240
Name: count, dtype: int64
No description has been provided for this image

Observation:¶

The distribution shown above exhibits left skewness, indicating that the majority of loans were issued in the year 2011, while the least number of loans were issued in 2007.¶
Initiating detailed analysis of term...

Statistical summary for term:
count          37485
unique             2
top        36 months
freq           28417
Name: term, dtype: object

The mode of term is:  36 months

Count summary for term:
term
 36 months    28417
 60 months     9068
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of grade...

Statistical summary for grade:
count     37485
unique        7
top           B
freq      11394
Name: grade, dtype: object

The mode of grade is: B

Count summary for grade:
grade
B    11394
A     9873
C     7615
D     4924
E     2508
F      900
G      271
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of sub_grade...

Statistical summary for sub_grade:
count     37485
unique        5
top           4
freq       7865
Name: sub_grade, dtype: object

The mode of sub_grade is: 4

Count summary for sub_grade:
sub_grade
4    7865
3    7743
5    7598
2    7434
1    6845
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of emp_length...

Statistical summary for emp_length:
count     37485
unique       11
top         10 
freq       9131
Name: emp_length, dtype: object

The mode of emp_length is: 10 

Count summary for emp_length:
emp_length
10     9131
0      4414
2      4184
3      3924
4      3247
5      3113
1      3110
6      2114
7      1671
8      1377
9      1200
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of home_ownership...

Statistical summary for home_ownership:
count     37485
unique        5
top        RENT
freq      18240
Name: home_ownership, dtype: object

The mode of home_ownership is: RENT

Count summary for home_ownership:
home_ownership
RENT        18240
MORTGAGE    16241
OWN          2906
OTHER          95
NONE            3
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of verification_status...

Statistical summary for verification_status:
count            37485
unique               3
top       Not Verified
freq             16528
Name: verification_status, dtype: object

The mode of verification_status is: Not Verified

Count summary for verification_status:
verification_status
Not Verified       16528
Verified           11558
Source Verified     9399
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of loan_status...

Statistical summary for loan_status:
count          37485
unique             2
top       Fully Paid
freq           32026
Name: loan_status, dtype: object

The mode of loan_status is: Fully Paid

Count summary for loan_status:
loan_status
Fully Paid     32026
Charged Off     5459
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of issue_d_month...

Statistical summary for issue_d_month:
count        37485
unique          12
top       December
freq          4101
Name: issue_d_month, dtype: object

The mode of issue_d_month is: December

Count summary for issue_d_month:
issue_d_month
December     4101
November     3876
October      3613
September    3390
August       3285
July         3272
June         3093
May          2848
April        2749
March        2621
January      2332
February     2305
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of loan_amnt_b...

Statistical summary for loan_amnt_b:
count          37485
unique             7
top       5000-10000
freq           12677
Name: loan_amnt_b, dtype: object

The mode of loan_amnt_b is: 5000-10000

Count summary for loan_amnt_b:
loan_amnt_b
5000-10000     12677
0-5000          9411
10000-15000     7553
15000-20000     4228
20000-25000     2657
25000-30000      643
30000-35000      316
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of int_rate_b...

Statistical summary for int_rate_b:
count     37485
unique       11
top       10-12
freq       8146
Name: int_rate_b, dtype: object

The mode of int_rate_b is: 10-12

Count summary for int_rate_b:
int_rate_b
10-12    8146
12-14    7073
6-8      6759
14-16    5096
8-10     4003
16-18    3037
18-20    1362
4-6      1335
20-22     546
22-24     124
24-26       4
0-2         0
2-4         0
Name: count, dtype: int64
No description has been provided for this image
Initiating detailed analysis of debt_to_income_b...

Statistical summary for debt_to_income_b:
count     37485
unique       15
top       12-14
freq       3809
Name: debt_to_income_b, dtype: object

The mode of debt_to_income_b is: 12-14

Count summary for debt_to_income_b:
debt_to_income_b
12-14    3809
14-16    3805
10-12    3621
16-18    3472
8-10     3363
18-20    3321
6-8      2903
20-22    2900
22-24    2555
4-6      2490
2-4      1865
0-2      1728
24-26    1188
26-28     277
28-30     188
Name: count, dtype: int64
No description has been provided for this image

Observation:¶

1. debt_to_income_b: It is a normally distributed graph with peak between 12 and 14 which indicates that the most of customers had a debt-to-income ratio within this range.¶
2. loan_amount_b: The graph indicates a right-skewed distribution with the peak indicating that most of loans are for amounts between 5,000 and 10,000.¶
3. int_rate_b: The distribution indicates that most of loans are between 10%-12%.¶
4. issue_b_month: The distribution is left-skewed, which indicates that most loan applications were issued in December,and the lowest in February.¶
5. loan_status: There are around 32,026 customers who have their loan status as fully paid while around 5,459 customers have loan status as charged-off.¶
6. verification_status: Around 16,528 customers who have been issued loans arent verified.¶
7. home_ownership: The majority of customers (around 18,240) have a home ownership status of "RENT," followed by 16,241 with "MORTGAGE," and only 2,906 who own their home.¶
8. emp_length: Customers with over 10 years of employment have applied for the most loans and that is around 9,131 applications.¶
9. sub_grade: Loans issued under subgrade 4 are the most common, while subgrade 1 has the least.¶
10. grade: The highest number of loans are issued under grade B (around 11,394), while the lowest is for grade G.¶
11. term: The majority of loans have a term of 36 months.¶

Segmented Univariate Exploration¶

Segmenting the loan status into 'fully_paid' and 'charged_off' and analyzing the variables that impact the loan status.¶
Initiating detailed analysis of int_rate...

Statistical summary for int_rate:
count    32026.000000
mean        11.561643
std          3.575163
min          5.420000
25%          8.490000
50%         11.490000
75%         13.980000
max         24.110000
Name: int_rate, dtype: float64

The mode of int_rate is: 10.99

No description has been provided for this image
No description has been provided for this image
Initiating detailed analysis of int_rate...

Statistical summary for int_rate:
count    5459.000000
mean       13.734362
std         3.611106
min         5.420000
25%        11.140000
50%        13.490000
75%        16.320000
max        24.400000
Name: int_rate, dtype: float64

The mode of int_rate is: 11.49

No description has been provided for this image
No description has been provided for this image

Observations¶

Interest Rate (int_rate): Loans tend to be fully paid off more frequently when the interest rate falls between approximately 9.4% and 11.0%. Conversely, there is a higher likelihood of default when interest rates are approximately between 12.5% and 14.2%.¶
Initiating detailed analysis of annual_inc...

Statistical summary for annual_inc:
count     32026.000000
mean      64411.371832
std       32263.098470
min        4000.000000
25%       40500.000000
50%       58000.000000
75%       80000.000000
max      189000.000000
Name: annual_inc, dtype: float64

The mode of annual_inc is: 60000.0

No description has been provided for this image
No description has been provided for this image
Initiating detailed analysis of annual_inc...

Statistical summary for annual_inc:
count      5459.000000
mean      58272.646664
std       30395.899329
min        4080.000000
25%       36500.000000
50%       52000.000000
75%       72000.000000
max      187000.000000
Name: annual_inc, dtype: float64

The mode of annual_inc is: 60000.0

No description has been provided for this image
No description has been provided for this image

Observations¶

Annual Income (annual_inc): Loans are most frequently fully paid when the annual income is between 45K-60K. In contrast, there are more charged-off loans when the annual income is between 30K-45K.¶
Initiating detailed analysis of loan_amnt...

Statistical summary for loan_amnt:
count    32026.000000
mean     10520.196871
std       6760.853423
min        500.000000
25%       5000.000000
50%       9000.000000
75%      14500.000000
max      35000.000000
Name: loan_amnt, dtype: float64

The mode of loan_amnt is: 10000.0

No description has been provided for this image
Initiating detailed analysis of loan_amnt...

Statistical summary for loan_amnt:
count     5459.000000
mean     11607.849423
std       7505.301274
min        900.000000
25%       5500.000000
50%      10000.000000
75%      16000.000000
max      35000.000000
Name: loan_amnt, dtype: float64

The mode of loan_amnt is: 10000.0

No description has been provided for this image

Observations¶

Loan Amount (loan_amount): Charge-offs are notably high for loan amounts between 5K-10K. However, further analysis is needed to understand the distribution of fully paid loans within this range.¶
Initiating detailed analysis of int_rate_b...

Statistical summary for int_rate_b:
count     32026
unique       11
top       10-12
freq       7097
Name: int_rate_b, dtype: object

The mode of int_rate_b is: 10-12

No description has been provided for this image
Initiating detailed analysis of int_rate_b...

Statistical summary for int_rate_b:
count      5459
unique       11
top       12-14
freq       1141
Name: int_rate_b, dtype: object

The mode of int_rate_b is: 12-14

No description has been provided for this image
Initiating detailed analysis of loan_amnt_b...

Statistical summary for loan_amnt_b:
count          32026
unique             7
top       5000-10000
freq           11046
Name: loan_amnt_b, dtype: object

The mode of loan_amnt_b is: 5000-10000

No description has been provided for this image
Initiating detailed analysis of loan_amnt_b...

Statistical summary for loan_amnt_b:
count           5459
unique             7
top       5000-10000
freq            1631
Name: loan_amnt_b, dtype: object

The mode of loan_amnt_b is: 5000-10000

No description has been provided for this image
Initiating detailed analysis of grade...

Statistical summary for grade:
count     32026
unique        7
top           B
freq       9998
Name: grade, dtype: object

The mode of grade is: B

No description has been provided for this image
Initiating detailed analysis of grade...

Statistical summary for grade:
count     5459
unique       7
top          B
freq      1396
Name: grade, dtype: object

The mode of grade is: B

No description has been provided for this image
Initiating detailed analysis of emp_length...

Statistical summary for emp_length:
count     32026
unique       11
top         10 
freq       7644
Name: emp_length, dtype: object

The mode of emp_length is: 10 

No description has been provided for this image
Initiating detailed analysis of emp_length...

Statistical summary for emp_length:
count     5459
unique      11
top        10 
freq      1487
Name: emp_length, dtype: object

The mode of emp_length is: 10 

No description has been provided for this image
Initiating detailed analysis of verification_status...

Statistical summary for verification_status:
count            32026
unique               3
top       Not Verified
freq             14407
Name: verification_status, dtype: object

The mode of verification_status is: Not Verified

No description has been provided for this image
Initiating detailed analysis of verification_status...

Statistical summary for verification_status:
count             5459
unique               3
top       Not Verified
freq              2121
Name: verification_status, dtype: object

The mode of verification_status is: Not Verified

No description has been provided for this image
Initiating detailed analysis of home_ownership...

Statistical summary for home_ownership:
count     32026
unique        5
top        RENT
freq      15454
Name: home_ownership, dtype: object

The mode of home_ownership is: RENT

No description has been provided for this image
Initiating detailed analysis of home_ownership...

Statistical summary for home_ownership:
count     5459
unique       4
top       RENT
freq      2786
Name: home_ownership, dtype: object

The mode of home_ownership is: RENT

No description has been provided for this image

Observations¶

1. Interest Rate (int_rate): Loans which have interest rates between 10%-12% have a higher chance of loan status being fully paid off. However, loans having interest rates between 12%-14% show a higher likelihood of charge-off.¶
2. Annual Income (annual_inc): Loans Status fully paid when the customer's annual income is between 45K-60K. In contrast, the customers having an annual income between 30K-45K tend to have more charged-off loans.¶
3. Grade: Borrowers with 10 or more years of employment have the highest count of loans and also is the highest in fully paid and charged-off segments. This indicates that they take more loans and are also prone to default more.¶
4. Employment Length (emp_length): Borrowers having 10 or more years of employment have the highest loan counts in fully paid and default segments. This indicates they take more loans and also default more.¶
5. Verification Status: Loans to borrowers who are not verified show higher peaks in being paid off compared to those who are verified. This indicates that unverified borrowers tend to repay their loans more frequently.¶
6. Home Ownership: Renters have a higher default rate, which might be explained by their higher expenses towards rent.¶

Bivariate Exploration¶

Bivariate Analysis - Numerical vs Numerical¶

No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Observations :¶

1. Loan Amount vs. Interest Rate:¶
- Slight positive correlation: Higher loan amounts correlate with slightly lower interest rates.¶
2. Loan Amount vs. Installment:¶
- Positive correlation: Larger loans entail higher installment payments.¶
3. Loan Amount vs. Annual Income:¶
- Weak positive correlation: As annual income increases, loan amount tends to rise, albeit weakly.¶
4. Loan Amount vs. Public Record Bankruptcies:¶
- Negative correlation: Public record bankruptcies minimally affect the loan amount qualification.¶
5. Annual Income vs. Interest Rate:¶
- Weak negative correlation: Drawing conclusions from this scatter plot alone is challenging due to the weak correlation.¶
6. Annual Income vs. Debt-to-Income Ratio:¶
- Negative correlation: As annual income increases, the debt-to-income ratio tends to decrease.¶

Bivariate Analysis - Categorical vs Numerical¶

No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Observations:¶

1. Term vs Loan Amount:¶
Loans with a 60-month term have higher average amounts as compared to those with a 36-month term.¶
2. Grade vs Loan Amount:¶
- Median of the loan amounts increases with higher grades (A to G), that indicates a very strong positive correlation.¶
- Grades C, D, and E exhibit wider variety in loan amounts when compared to other grades.¶
4. Employment Length vs. Loan Amount:¶
There is no clear correlation between employment length and the loan amount. Variety in loan amounts differs across employment lengths.¶
4. Loan Status vs. Loan Amount:¶
Median of the loan amounts is almost same for charged-off and fully paid loans, but charged-off loans exhibit greater variety.¶
5. Loan Status vs. Interest Rate:¶
Charged-off loans have higher median interest rates than the fully paid loans, thereby suggesting a potential correlation between interest rate and loan default.¶
7. Grade vs. Interest Rate:¶
- Median of the interest rate increases with lower loan grades (A to G), indicating that riskier loans carry higher rates.¶
- Some grades (e.g., B, C, D, and E) show wider variety in interest rates.¶
7. Verification Status vs. Loan Amount:¶
There is no much difference in median income of verified and unverified customers. Unverified applicants show wider variety in income.¶
9. Home Ownership vs. Loan Amount:¶
Median of the loan amounts is high for customers with mortgages as compared to renters and others, suggesting a correlation between homeownership and the loan amount.¶

Bivariate Analysis - Categorical vs Categorical¶

Category: term

No description has been provided for this image
term loan_status 36 months 60 months
0 Charged Off 11.15 25.25
1 Fully Paid 88.85 74.75
Category: loan_amnt_b

No description has been provided for this image
loan_amnt_b loan_status 0-5000 5000-10000 10000-15000 15000-20000 20000-25000 25000-30000 30000-35000
0 Charged Off 13.91 12.87 13.88 17.43 19.5 21.31 25.0
1 Fully Paid 86.09 87.13 86.12 82.57 80.5 78.69 75.0
Category: int_rate_b

No description has been provided for this image
int_rate_b loan_status 0-2 2-4 4-6 6-8 8-10 10-12 12-14 14-16 16-18 18-20 20-22 22-24 24-26
0 Charged Off NaN NaN 4.12 5.65 9.59 12.88 16.13 19.35 26.11 30.4 36.26 44.35 50.0
1 Fully Paid NaN NaN 95.88 94.35 90.41 87.12 83.87 80.65 73.89 69.6 63.74 55.65 50.0
Category: grade

No description has been provided for this image
grade loan_status A B C D E F G
0 Charged Off 6.07 12.25 17.35 22.22 26.71 32.11 33.21
1 Fully Paid 93.93 87.75 82.65 77.78 73.29 67.89 66.79
Category: home_ownership

No description has been provided for this image
home_ownership loan_status MORTGAGE NONE OTHER OWN RENT
0 Charged Off 13.68 0.0 17.89 14.97 15.27
1 Fully Paid 86.32 100.0 82.11 85.03 84.73
Category: purpose

No description has been provided for this image
purpose loan_status car credit_card debt_consolidation educational home_improvement house major_purchase medical moving other renewable_energy small_business vacation wedding
0 Charged Off 10.82 10.76 15.28 16.93 12.21 15.76 10.43 15.81 15.96 16.22 18.56 27.3 14.25 10.35
1 Fully Paid 89.18 89.24 84.72 83.07 87.79 84.24 89.57 84.19 84.04 83.78 81.44 72.7 85.75 89.65

Observations:¶

1. Comparison between Loan Status and Term:¶
Risk Levels (Charged Off Rates):¶
  • Less Risky: 36 months (11.15%)¶
  • More Risky: 60 months (25.25%)¶
Distribution of Loan Terms:¶
  • More Common: 36-month loans¶
  • Less Common: 60-month loans¶
2. Comparison between Loan Status and Loan Amount:¶
Risk Levels (Charged Off Rates):¶
  • Less Risky: 5000-10000 (12.87%), 0-5000 (13.91%), 10000-15000 (13.88%)¶
  • More Risky: 15000-20000 (17.43%), 20000-25000 (19.5%), 25000-30000 (21.31%)¶
Distribution of Loan Amounts:¶
  • More Prevalent: Loans in the 5000-10000 and 0-5000 ranges¶
  • Less Prevalent: Loans in the 25000-30000 and 20000-25000 ranges, indicating fewer larger loans.¶
3. Comparison between Loan Status and Interest Rate:¶
Risk Levels (Charged Off Rates):¶
  • Less Risky: 4-6% (4.12%), 6-8% (5.65%), 8-10% (9.59%)¶
  • More Risky: 16-18% (26.11%), 18-20% (30.4%), 20-22% (36.26%), 22-24% (44.35%)¶
Distribution of Interest Rates:¶
  • More Common: Loans with 10-12% and 6-8% interest rates¶
  • Less Common: Loans with 22-24% and 20-22% interest rates, indicating fewer high-risk, high-interest loans.¶
4. Comparison between Loan Status and Grade:¶
Risk Levels (Charged Off Rates):¶
  • Less Risky: Grade A (6.07%), B (12.25%), and C (17.35%)¶
  • More Risky: Grade D (22.22%), E (26.71%), F (32.11%), and G (33.21%)¶
Distribution of Loan Grades:¶
  • More Prevalent: Grades B and A¶
  • Less Prevalent: Grades F and G, indicating fewer high-risk loans.¶
5. Comparison between Loan Status and Home Ownership:¶
Risk Levels (Charged Off Rates):¶
  • More Risky: OTHER (17.89%), RENT (15.27%), OWN (14.97%)¶
  • Less Risky: MORTGAGE (13.68%), NONE (0.0%)¶
Distribution of Home Ownership:¶
  • High Prevalence: RENT and MORTGAGE are most common.¶
6. Comparison between Loan Status and Purpose:¶
Risk:¶
  • High Risk: Small business, renewable energy, educational loans.¶
  • Low Risk: Wedding, car, credit card, major purchase loans.¶
Demand:¶
  • High Demand: Debt consolidation, credit card loans.¶
  • Low Demand: Renewable energy, educational loans.¶

Multivariate Analysis¶

No description has been provided for this image

Observation:¶

1. Loan Amounts:¶
  • Strong positive correlations with funded amount and funded amount invested, moderate correlation with installment, and weak correlation with borrower characteristics and interest rates.¶
2. Funded Amount:¶
  • Strong positive correlations with loan amount and funded amount invested, moderate correlation with installment, and weak correlation with borrower characteristics and interest rates.¶
3. Funded Amount Invested:¶
  • Strong positive correlations with loan amount and funded amount, moderate correlation with installment, and weak correlation with borrower characteristics and interest rates.¶
4. Interest Rate:¶
  • Weak correlations with loan attributes, indicating some association but not particularly strong.¶
5. Installment:¶
  • Strong positive correlations with loan amount, funded amount, and funded amount invested, weak correlation with borrower characteristics and interest rates.¶
6. Annual Income:¶
  • Weak correlations with loan attributes, indicating a slight association with loan amounts but not particularly strong.¶
7. Debt-to-Income Ratio:¶
  • Weak correlations with loan attributes, suggesting some association but not particularly strong.¶
8. Public Record Bankruptcies:¶
  • Almost negligible correlation with loan attributes, indicating minimal association with borrower characteristics and loan terms.¶
9. Issue Date Year:¶
  • Negligible correlation with loan attributes except for a somewhat stronger correlation with funded amount invested.¶

Recommendations¶

Based on the above analysis, below are the factors that influence loan defaults:¶
  • Loan Term: Loans with longer terms (60 months) have higher default rates compared to shorter-term loans (36 months), indicating higher risk or borrower instability.¶
  • Loan Amount: For loan amounts exceeding $15,000, there is a trend of higher default rates. Borrowers may find it difficult to make repayments for larger loans, leading to increased charge-offs.¶
  • Interest Rate: Loans with higher interest rates, such as 16%, are associated with higher charge-off rates. High-interest loans may attract riskier customers.¶
  • Loan Grade: Lower-grade loans (D, E, F, G) have higher charge-off rates compared to higher-grade loans (A, B, C).¶
  • Home Ownership: Non-traditional home ownership, such as renting or other categories, is linked with higher charge-off rates compared to mortgage holders.¶
  • Loan Purpose: Loans for small businesses, renewable energy, and education exhibit higher charge-off rates, while loans for weddings, cars, credit cards, and major purchases have lower charge-off rates.¶
These factors provide insights into customer behavior that can help lenders understand and mitigate loan charge-off risks.¶